Generalization in Native Language Identification: Learners versus Scientists
Abstract
English. Native Language Identification (NLI) is the task of recognizing an author’s native language from text in another language. In this paper, we consider three English learner corpora and one new, presumably more difficult, scientific corpus. We find that the scientific corpus is only about as hard to model as a less-controlled learner corpus, but cannot profit as much from corpus combination via domain adaptation. We show that this is related to an inherent topic bias in the scientific corpus: researchers from different countries tend to work on different topics.
Italiano. La Native Language Identification (NLI) permette di riconoscere la lingua madre di un autore utilizzando il testo scritto in un’altra lingua. In questo lavoro utilizziamo tre collezioni di testi prodotti da apprendenti di inglese e un nuovo corpus scientifico, presumibilmente più difficile. In realtà, il corpus scientifico risulta essere difficile da modellare quanto un corpus di apprendimento meno controllato; tuttavia, a differenza di questi, esso non beneficia della combinazione di diversi corpora con metodi di domain adaptation. Questo limite è legato ad un’intrinseca specializzazione degli argomenti del corpus scientifico: ricercatori di paesi diversi tendono a trattare argomenti diversi.
Similar resources
Attitudes towards English Language Norms in the Expanding Circle: Development and Validation of a new Model and Questionnaire
This paper describes the development and validation of a new model and questionnaire to measure Iranian English as a foreign language learners’ attitudes towards the use of native versus non-native English language norms. Based on a comprehensive review of the related literature and interviews with domain experts, five factors were identified. A draft version of a questionnaire based on those f...
Native Language Interference in Writing: A case study of Thai EFL learners
The interference of the native language in acquiring a foreign language is unavoidable. In an attempt to explore the phenomenon why this occurs, the study was conducted in English as a foreign language writing. The study also investigated how the native language interference occurred in the writing process. In fact, this qualitative study explored the reasons and the process of na...
The Effect of Standard and Reversed Subtitling versus No Subtitling Mode on L2 Vocabulary Learning
Audiovisual material accompanied by interlingual subtitles is a powerful pedagogical tool which can help improve the vocabulary learning of second-language learners. This study was intended to determine whether or not the mode (standard and reversed) of subtitling affects the incidental vocabulary acquisition of Iranian L2 learners while watching TV programs. Forty-five participants were random...
Acoustic Analysis of Persian EFL Learners' Pronunciation of English Vowels
This paper reports the results of an experimental study on non-native production of English vowels. Two groups of Persian EFL learners varying in language proficiency were tested on their ability to produce the nine plain vowels of American English. Vowel production accuracy was assessed by means of acoustic measurements. Ladefoged and Maddieson’s (1996) F1 F2 measurements for American English v...